Abstract. We show that multiple polynomial ergodic averages arising from
nilpotent groups of measure preserving transformations of a probability space
always converge in the $L^2$ norm.



1. Introduction 

The purpose of this paper is to prove the following result. 

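Theorem 1.1. Let $G$ be a nilpotent group of measure preserving transformations
of a probability space $(X, \mathcal{X}, \mu)$. Then, for every $T_1, \dots, T_l \in G$, the averages

$$\frac{1}{N} \sum_{n=1}^{N} \prod_{j=1}^{d} \left( T_1^{p_{1,j}(n)} \cdots T_l^{p_{l,j}(n)} \right) f_j \tag{1.1}$$

always converge in $L^2(X, \mathcal{X}, \mu)$, for every $f_1, \dots, f_d \in L^\infty(X, \mathcal{X}, \mu)$ and every set
of integer valued polynomials $p_{i,j}$.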

This result was conjectured in the present form by Bergelson and Leibman, who
also showed that even $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} T^n f \, S^n g$ need not exist if $T$ and $S$ only
generate a solvable group [5].

1.1. Historical background. Partial results towards Theorem 1.1 have a rich
history. Notice that when $d = l = 1$ and the polynomial is linear it reduces to the
classical mean ergodic theorem. The only case of Theorem 1.1 which was fully
settled is that in which $T_1 = \dots = T_l$, that is, when $G$ is a cyclic group. The
study of this case originated in the seminal work of Furstenberg [10] on Szemerédi's
theorem, while a general solution when the polynomials are linear was later
provided by Host and Kra [14] following the work of several authors (with a different
proof subsequently found by Ziegler [21]). Convergence for general polynomials
was established by Bergelson [5] under the assumption of weak mixing, while the
first unconditional nonlinear result was obtained by Furstenberg and Weiss [11].
The general result for cyclic groups and arbitrary polynomials was finally settled
by Host and Kra [15] and Leibman [18].

Another case of Theorem 1.1 which is known is that in which $G$ is abelian and
every polynomial is linear. Here, the case $d = 2$ was proven by Conze and Lesigne
[S] and assuming extra ergodicity hypotheses on the transformations Zhang [5T]
gave a proof for $d = 3$ and Frantzikinakis and Kra jj] for general $d$. Without these
assumptions, this result was established by Tao [19] and by now possesses several
different proofs [5] [T^J [5D]. However, when $G$ is abelian but the polynomials are
arbitrary, very little was known. It was shown by Chu, Frantzikinakis and Host [7]
that

$$\frac{1}{N} \sum_{n=1}^{N} T_1^{p_1(n)} f_1 \cdots T_l^{p_l(n)} f_l \tag{1.2}$$

converges whenever the polynomials $p_i$ have distinct degrees, but the convergence
of (1.2) has remained open for arbitrary polynomials. Notice that (1.2) corresponds
to taking $p_{i,j} = 0$ whenever $i \neq j$ in Theorem 1.1. More generally, very little was
known until now for convergence of $\mathbb{Z}^d$ actions along polynomials. A particular
result in this direction is the convergence of the averages



which was established by Austin [3l 0] . 

Finally, when G is only assumed to be nilpotent the results are much scarcer. 
Prior to this paper, it was known by the work of Bergelson and Leibman [S] that 
the averages 



always converge in $L^2$, but even in the linear case no convergence result has been
previously established for more than two transformations.

1.2. Overview of the proof. Our proof of Theorem 1.1 does not make use of
the aforementioned results and therefore provides an alternative proof of these
statements, which in many cases is substantially simpler than the original ones. In
particular, we do not make use of the machinery of characteristic factors which is
heavily used in previous literature. The price we pay in doing so is that we do
not obtain any explicit description of the limits. In this sense, our approach is
similar to that of Tao [19], in that we use a weak inverse theorem (see Lemma 3.4)
to decompose our functions into the sum of a random component, which is easily
treatable, and a structured one, which can be handled by an adequate induction.
Interestingly, we find that our decomposition is best carried out by adapting ideas
of Gowers related to the Hahn-Banach separation theorem [12] and this is done in
§2. This is arguably the first time that these ideas are used in a purely ergodic
theoretical context.

The main new ingredient of the proof is the concept of an $L$-reducible function
(Definition 3.3), which will play the role of the structured component of our
decompositions. We refer to §3 for precise definitions, but for now let us discuss what
these are in the linear abelian case. Here, an $L$-reducible function $\sigma$ with respect
to a set of transformations $T_1, \dots, T_j$ is a function for which the behavior of $T_j^n \sigma$
can be somewhat recovered from that of the set $T_1^n b_1, \dots, T_{j-1}^n b_{j-1}$, for some
prescribed set of functions $b_i$. This way, the problem of convergence for the set of
transformations $T_1, \dots, T_j$ is reduced to the analogous question for the smaller set
$T_1, \dots, T_{j-1}$, and one may then proceed inductively. The details of these reductions
are carried out in §3.

When either $G$ is not abelian or the polynomials are not linear, the system of
transformations to which $L$-reducible functions allow us to pass does not admit such
a simple expression. In general, it will consist of twice as many transformations
as the original one and the degree of the polynomials involved may not necessarily
decrease, so that it may seem that we have not gained much with this procedure.
As it turns out however, one can define a suitable notion of complexity for every
set of transformations and show that the above process does indeed lead us to a set
of lower complexity. The proof that every system of transformations of the type
studied in Theorem 1.1 reduces in finitely many steps to one consisting only of the
identity transformation $Ix = x$ is performed in §4 and this completes the proof of
Theorem 1.1.

The methods of this paper immediately yield some further convergence results
and these are discussed in Section 5. We also include an appendix with several
examples of how the induction process mentioned in the previous paragraph works
in some concrete cases.

Acknowledgments. I would like to thank my advisor Roman Sasyk for a careful 
reading of the manuscript and several helpful discussions. I would also like to thank 
Tim Austin, Vitaly Bergelson and Nikos Frantzikinakis for their useful comments. 

2. Decompositions through the Hahn-Banach theorem 

In this section we will review some of the tools developed by Gowers and adapt 
them to the context of our problem. For a much better discussion of these topics 
the reader is referred to Gowers's paper [12]. We begin by stating the Hahn-Banach 
separation theorem in the form it will be needed. 

Theorem 2.1 (Geometric Hahn-Banach). Let $A$ be an open convex subset
containing $0$ of a real topological vector space $V$ and suppose $v \in V$ does not lie in $A$.
Then there exists some continuous linear functional $\phi : V \to \mathbb{R}$ such that $\phi(v) \ge 1$
and $\phi(w) < 1$ for every $w \in A$.

The idea of Gowers to obtain decompositions can roughly be described as fol- 
lows. While it may be difficult to check directly whether an arbitrary function can 
be described by the sum of a structured and a random component, if such a de- 
composition fails to exist an application of the Hahn-Banach theorem would allow 
us to find, since these sets tend to be convex, some large functional which does not 
correlate with random functions (therefore having a kind of structure itself) nor 
with structured functions (therefore also having some randomness). This way, we 
are only left with proving that no object can be random and structured at the same 
time, which generally tends to be an easier task. 

We will concentrate on the study of a real Hilbert space $\mathcal{H}$ with norm $\|\cdot\|$. In
order to apply the above scheme, we will need the following corollary.

Corollary 2.2 (cf. [12, Corollary 3.2]). Let $A_1, \dots, A_n$ be open convex subsets
containing $0$ of some real Hilbert space $\mathcal{H}$. Let $c_1, \dots, c_n > 0$ be positive real numbers
and suppose $f \in \mathcal{H}$ cannot be written as $\sum_{j=1}^{n} c_j f_j$ with $f_j \in A_j$. Then there exists
some $\phi \in \mathcal{H}$ such that $\langle \phi, f \rangle \ge 1$ and $\langle \phi, g_i \rangle < c_i^{-1}$ for every $g_i \in A_i$.

Proof. Since the set $A := \sum_{i=1}^{n} c_i A_i$ will be an open convex set in $\mathcal{H}$ containing $0$
but not $f$, it follows by the Hahn-Banach theorem that there exists some $\phi \in \mathcal{H}$
satisfying $\langle \phi, f \rangle \ge 1$ and $\langle \phi, g \rangle < 1$, for every $g \in A$. The result follows immediately,
since $c_j g_j \in A$ for every $g_j \in A_j$. □

Given a positive real number $\delta$ and some decreasing function $\eta : \mathbb{R}^+ \to \mathbb{R}^+$ we
will consider the sequence $C_1^{\delta,\eta}, \dots, C_{\lceil 2\delta^{-2} \rceil}^{\delta,\eta}$ defined recursively by

$$C_{\lceil 2\delta^{-2} \rceil}^{\delta,\eta} := 1, \qquad C_{n-1}^{\delta,\eta} := \max\left\{ C_n^{\delta,\eta},\ 2\eta(C_n^{\delta,\eta})^{-1} \right\}. \tag{2.1}$$

We shall also write $C^{\delta,\eta} := C_1^{\delta,\eta}$. These constants will provide the parameters
for the decomposition obtained below and the fact that they are independent of
the specific decomposition will allow us to do a priori modifications on our set of
structured functions so that they are better suited to the resulting bounds.
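As a purely illustrative sketch of the recursion (2.1), the following computes the constants exactly with rational arithmetic. The function name `constants`, the sample value of $\delta$ and the sample function $\eta(x) = 1/(4x)$ are our own hypothetical choices, and the factor 2 inside the maximum is read off from the bound $\eta(C_j)^{-1} C_i^{-1} \le 1/2$ that is needed in the proof of Proposition 2.3 below.

```python
from fractions import Fraction
from math import ceil

def constants(delta, eta):
    """Return [C_1, ..., C_R] with R = ceil(2 / delta^2), built backwards from C_R = 1.

    Assumed recursion: C_{n-1} = max(C_n, 2 / eta(C_n)).
    """
    R = ceil(2 / delta**2)
    C = [Fraction(1)] * R
    for n in range(R - 1, 0, -1):      # list index n - 1 holds C_n
        C[n - 1] = max(C[n], 2 / eta(C[n]))
    return C

# Hypothetical sample data: delta = 4/5 and the decreasing function eta(x) = 1/(4x).
eta = lambda x: Fraction(1, 4) / x
C = constants(Fraction(4, 5), eta)
print(C)  # [Fraction(512, 1), Fraction(64, 1), Fraction(8, 1), Fraction(1, 1)]
```

With these sample choices one can check directly that $\eta(C_j)^{-1} C_i^{-1} \le 1/2$ whenever $i < j$, which is the only property of the constants used later on.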

Given some norm $\|\cdot\|_N$ on $\mathcal{H}$ equivalent to $\|\cdot\|$, we define its dual norm by

$$\|f\|_N^* := \sup_{\|g\|_N \le 1} |\langle f, g \rangle|.$$



NORM CONVERGENCE OF NILPOTENT ERGODIC AVERAGES 






Notice that $\|\cdot\|_N^*$ is then also equivalent to $\|\cdot\|$. We will be concerned with the study
of an infinite family of norms $(\|\cdot\|_N)_{N \in \mathbb{N}}$ measuring increasing rates of structure and
for which their dual norms $(\|\cdot\|_N^*)_{N \in \mathbb{N}}$ measure decreasing rates of randomness. As
it turns out, we will need to work with this large family of norms simultaneously,
so that if we know one of the components is random at a level $A$ (that is, $\|\cdot\|_A^*$ is
small), we need the other component to be structured at a much higher level $B$
(that is, $\|\cdot\|_B$ must be small for some $B$ much larger than $A$). This is accomplished
by the following result.

Proposition 2.3. Let $(\|\cdot\|_N)_{N \in \mathbb{N}}$ be a family of norms on $\mathcal{H}$ equivalent to $\|\cdot\|$ and
satisfying $\|\cdot\|_{N+1}^* \le \|\cdot\|_N^*$ for every $N$. Let $0 < \delta, c < 1$ be positive real numbers,
$\eta : \mathbb{R}^+ \to \mathbb{R}^+$ some decreasing function and $\psi : \mathbb{N} \to \mathbb{N}$ some function satisfying
$\psi(N) \ge N$ for all $N$. Then, for every integer $M_\bullet > 0$, there exists a sequence

$$M_\bullet \le M_1 \le \dots \le M_{\lceil 2\delta^{-2} \rceil} \le M^\bullet = O_{M_\bullet, \delta, c, \psi}(1),$$

which does not depend on the specific family of norms, with the property that for
any $f \in \mathcal{H}$ with $\|f\| \le 1$, we can find some $1 \le i \le \lceil 2\delta^{-2} \rceil$ and integers $A, B$ with
$M_\bullet \le A < cM_i < \psi(M_i) \le B$, such that we have the decomposition $f = f_1 + f_2 + f_3$
with

$$\|f_1\|_B < C_i^{\delta,\eta}, \qquad \|f_2\|_A^* < \eta(C_i^{\delta,\eta}), \qquad \|f_3\| < \delta.$$

Proof. Our proof is modeled on the proof of Proposition 3.5 of [12]. Set $A_1 := M_\bullet$,
$M_1 := \lceil c^{-1} A_1 + 1 \rceil$ and $B_1 := \psi(M_1)$. If there is no decomposition of the desired
form with these parameters and $C_1 := C_1^{\delta,\eta}$ we may apply Corollary 2.2 to obtain
some $\phi_1 \in \mathcal{H}$ such that $\langle \phi_1, f \rangle \ge 1$, $\|\phi_1\|_{B_1}^* \le C_1^{-1}$, $\|\phi_1\|_{A_1}^{**} \le \eta(C_1)^{-1}$ and $\|\phi_1\| \le \delta^{-1}$,
where we are using the fact that if $\|\cdot\|_N$ is some norm equivalent to $\|\cdot\|$, then
$\{f \in \mathcal{H} : \|f\|_N < 1\}$ is an open convex set in $\mathcal{H}$ containing $0$.

Recursively, if we cannot find a decomposition with parameters $A_{j-1}$, $M_{j-1}$, $B_{j-1}$,
$C_{j-1}$ we set $A_j := B_{j-1}$, $M_j := \lceil c^{-1} A_j + 1 \rceil$, $B_j := \psi(M_j)$ and $C_j := C_j^{\delta,\eta}$.
If no such decomposition exists with these parameters we can then use Corollary
2.2 to find some $\phi_j \in \mathcal{H}$ with properties analogous to the ones above. This way we
construct a sequence of elements obeying the orthogonality relationships

$$|\langle \phi_i, \phi_j \rangle| \le \|\phi_i\|_{A_j}^* \|\phi_j\|_{A_j}^{**} \le \|\phi_i\|_{B_i}^* \|\phi_j\|_{A_j}^{**} \le \eta(C_j)^{-1} C_i^{-1} \le 1/2,$$

whenever $i < j$, by construction of $C_k$. But then, by the bounds on $\|\phi_i\|$, we obtain
upon expanding the inner product

$$\|\phi_1 + \dots + \phi_r\|^2 \le \delta^{-2} r + \frac{r^2 - r}{2}, \tag{2.2}$$

for each $r \le \lceil 2\delta^{-2} \rceil$. On the other hand, the condition $\langle \phi_i, f \rangle \ge 1$ for all $i$ implies
that the left-hand side of (2.2) is at least $r^2$. Since this is absurd for $r = \lceil 2\delta^{-2} \rceil$
the result follows. □
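The counting step that closes the proof above can be checked by exact arithmetic. The sketch below is only a sanity check under the assumption that the right-hand side of (2.2) reads $\delta^{-2} r + (r^2 - r)/2$, as the expansion of the inner product with cross terms bounded by $1/2$ suggests; the helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def bounds(delta, r):
    """Return (lower, upper): r^2 versus the assumed right-hand side of (2.2)."""
    lower = r * r
    upper = r / delta**2 + Fraction(r * r - r, 2)
    return lower, upper

def contradiction_at_last_step(delta):
    """True when r = ceil(2 / delta^2) makes the two bounds incompatible."""
    r = ceil(2 / delta**2)
    lower, upper = bounds(delta, r)
    return lower > upper

print(contradiction_at_last_step(Fraction(3, 4)))  # True
```

For small $r$ the two bounds are compatible (for instance `bounds(Fraction(1, 2), 1)` gives `(1, 4)`), which is why the construction has to be iterated up to $r = \lceil 2\delta^{-2} \rceil$ before a contradiction appears.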

Finally, we also prove the following lemma that will be needed later. 

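Lemma 2.4 (cf. [12, Corollary 3.5]). Let $\Sigma \subset \mathcal{H}$ be a bounded set and suppose the
norm

$$\|f\|_\Sigma := \inf\left\{ \sum_{j=0}^{k-1} |\lambda_j| \ :\ f = \sum_{j=0}^{k-1} \lambda_j \sigma_j,\ \sigma_j \in \Sigma \right\} \tag{2.3}$$

is well defined and equivalent to $\|\cdot\|$. Then its dual norm is given by
$\|f\|_\Sigma^* = \sup_{\sigma \in \Sigma} |\langle f, \sigma \rangle|$.

Proof. Given some $f \in \mathcal{H}$ it is clear on one hand that

$$\sup_{\sigma \in \Sigma} |\langle f, \sigma \rangle| \le \sup_{\|g\|_\Sigma \le 1} |\langle f, g \rangle|.$$

On the other hand, for every $\varepsilon > 0$, if $g = \sum_{j=0}^{k-1} \lambda_j \sigma_j$ with $\sum_{j=0}^{k-1} |\lambda_j| < 1 + \varepsilon$, then
$|\langle f, g \rangle| \le (1 + \varepsilon) \sup_{\sigma \in \Sigma} |\langle f, \sigma \rangle|$. The result follows. □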

3. Norm convergence for systems of finite complexity 

From now on fix a nilpotent group $G$ and a probability space $X$ as in the
statement of Theorem 1.1. By a $G$-sequence we shall mean a sequence $\{g(n)\}_{n \in \mathbb{Z}}$ taking
values in $G$. An ordered tuple $g = (g_1, \dots, g_j)$ of $G$-sequences will be called a
system, and for each system one can ask whether the corresponding ergodic averages

$$A_N^g[f_1, \dots, f_j] := \mathbb{E}_{n \in [N]} \prod_{i=1}^{j} g_i(n) f_i, \tag{3.1}$$

converge in $L^2(X)$ for every $f_1, \dots, f_j \in L^\infty(X)$. Here, for a finite set $A$ we write
$\mathbb{E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$ and for every positive integer $N$ it is $[N] := \{1, \dots, N\}$.
We say two systems are equivalent if they consist of the same $G$-sequences, so
for example if $g, h$ are $G$-sequences then the system $(h, g)$ is equivalent to the
system $(g, h)$, and so is $(g, h, h)$. Clearly, the convergence of the averages of the
form (3.1) for some system implies the convergence of the averages associated to
every equivalent system, since $T(f_1) T(f_2) = T(f_1 f_2)$ for every $T \in G$ and $f_1, f_2 \in
L^\infty(X)$.
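To fix ideas, the simplest instance of (3.1) can be explored numerically. The sketch below is not part of the argument: it takes the one-term system $g(n) = T^n$ with $T$ an irrational rotation of the circle (our own illustrative choice, as are the function names, the rotation number and the grid size) and measures the $L^2$ size of an average and of the difference between two averages.

```python
import math

ALPHA = math.sqrt(2)   # irrational rotation number (illustrative choice)
GRID = 64              # sample points x_k = k / GRID on the circle [0, 1)

def f(x):
    return math.cos(2 * math.pi * x)

def average(N, x):
    """(1/N) * sum_{n=1}^{N} f(x + n * ALPHA): the average (3.1) for g(n) = T^n."""
    return sum(f(x + n * ALPHA) for n in range(1, N + 1)) / N

def l2_norm(h):
    """Approximate the L^2 norm on [0, 1) by an equally spaced Riemann sum."""
    return math.sqrt(sum(h(k / GRID) ** 2 for k in range(GRID)) / GRID)

def stability(N, N2):
    """L^2 norm of the difference between the averages at scales N2 and N."""
    return l2_norm(lambda x: average(N2, x) - average(N, x))

print(l2_norm(lambda x: average(2000, x)), stability(2000, 4000))
```

In this toy case the averages tend to zero at rate $O(1/N)$, so both printed quantities are small; the difficulty addressed in this section is to obtain such stability uniformly over all systems of a given complexity.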

To each pair of $G$-sequences $g, h$ we will associate, for each positive integer $m$,
the $G$-sequence

$$\langle g | h \rangle_m(n) := g(n) g(n+m)^{-1} h(n+m),$$

and we define the $m$-reduction of a system $g = (g_1, \dots, g_j)$ to be the system

$$g_m^* = (g_1, \dots, g_{j-1}, \langle g_j | 1_G \rangle_m, \langle g_j | g_1 \rangle_m, \dots, \langle g_j | g_{j-1} \rangle_m),$$

where by a slight abuse of notation we write $1_G$ for the $G$-sequence $1_G(n) := 1_G$,
where $1_G$ is the identity of $G$. The main purpose of this section will be to show
that one can deduce the convergence of the averages (3.1) for some system $g$ from
knowing this (actually, the slightly stronger Theorem 3.2 below) for every reduction
$g_m^*$ of $g$. This leads us to define the complexity of a system.
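The definition of the $m$-reduction can be made concrete in a small nilpotent group. The following sketch uses the integer Heisenberg group, with a triple $(a, b, c)$ standing for the upper unitriangular matrix with entries $a, b$ above the diagonal and $c$ in the corner. The group, the encoding and all function names are our own assumptions, chosen only for illustration.

```python
from typing import Callable, List, Tuple

Elem = Tuple[int, int, int]      # (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]]
Seq = Callable[[int], Elem]      # a G-sequence n -> g(n)

IDENTITY: Elem = (0, 0, 0)

def mul(x: Elem, y: Elem) -> Elem:
    """Product in the integer Heisenberg group."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x: Elem) -> Elem:
    a, b, c = x
    return (-a, -b, a * b - c)

def one(n: int) -> Elem:
    """The constant G-sequence 1_G."""
    return IDENTITY

def bracket(g: Seq, h: Seq, m: int) -> Seq:
    """The G-sequence <g|h>_m(n) = g(n) g(n+m)^{-1} h(n+m)."""
    return lambda n: mul(mul(g(n), inv(g(n + m))), h(n + m))

def reduction(system: List[Seq], m: int) -> List[Seq]:
    """The m-reduction (g_1, ..., g_{j-1}, <g_j|1>_m, <g_j|g_1>_m, ..., <g_j|g_{j-1}>_m)."""
    *head, last = system
    return head + [bracket(last, one, m)] + [bracket(last, g_i, m) for g_i in head]

# The linear system (T^n, S^n) with T = (1, 0, 0) and S = (0, 1, 0).
T_pow = lambda n: (n, 0, 0)
S_pow = lambda n: (0, n, 0)
reduced = reduction([T_pow, S_pow], m=3)
print([seq(5) for seq in reduced])  # [(5, 0, 0), (0, -3, 0), (8, -3, 0)]
```

Note that a system of size $j$ is sent to one of size $2j - 1$, and that in this example the new sequences are $S^{-m}$ and $S^{-m} T^{n+m}$: the dependence on $n$ through $S$ has disappeared, which is the kind of gain that the notion of complexity defined next is meant to quantify.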

Definition 3.1 (Complexity of a system). We say a system $g$ has complexity $0$ if
it is equivalent to the trivial system $(1_G)$ (that is, the system consisting only of the
sequence $1_G$). Recursively, we say a system $g$ has complexity $d$, for some positive
integer $d \ge 1$, if it is not of complexity $d'$ for any $0 \le d' < d$ and it is equivalent
to some system $h$ for which every reduction $h_m^*$ has complexity $\le d - 1$. We say a
system has finite complexity if it has complexity $d$ for some integer $d \ge 0$.

Given a system $g = (g_1, \dots, g_j)$, some set of functions $f_1, \dots, f_j \in L^\infty(X)$ and
a pair of integers $N, N'$, write

$$A_{N,N'}^g[f_1, \dots, f_j] := A_{N'}^g[f_1, \dots, f_j] - A_N^g[f_1, \dots, f_j].$$
We have the following result. 

Theorem 3.2. Let $G$ and $X$ be as above and let $d \ge 0$. Let $F : \mathbb{N} \to \mathbb{N}$ be some
non-decreasing function with $F(N) \ge N$ for all $N$ and let $\varepsilon > 0$ be some positive
real number. Then, for every integer $M > 0$, there exists a sequence of integers

$$M \le M_1^{\varepsilon,F,d} \le \dots \le M_{K_{\varepsilon,d}}^{\varepsilon,F,d} \le M^{\varepsilon,F,d} = O_{d,F,\varepsilon,M}(1), \tag{3.2}$$

for some $K_{\varepsilon,d} = O_{\varepsilon,d}(1)$, such that for every system $g = (g_1, \dots, g_j)$ of complexity
at most $d$ and every choice of functions $f_1, \dots, f_j \in L^\infty(X)$ with $\|f_i\|_\infty \le 1$, there
exists some $1 \le i \le K_{\varepsilon,d}$ such that

$$\left\| A_{N,N'}^g[f_1, \dots, f_j] \right\|_{L^2(X)} \le \varepsilon, \tag{3.3}$$

for every $M_i^{\varepsilon,F,d} \le N, N' \le F(M_i^{\varepsilon,F,d})$.



This type of statement already appears in the works of Tao [19] and of Avigad,
Gerhardy and Towsner [lj. Clearly, Theorem 3.2 implies that the averages
(3.1) converge in $L^2(X)$ for every system $g$ of finite complexity, since otherwise
one could find some $\varepsilon > 0$ and some increasing function $F : \mathbb{N} \to \mathbb{N}$ such that
$\|A_{N,F(N)}^g[f_1, \dots, f_j]\|_2 > \varepsilon$ for every integer $N$. The usefulness of Theorem 3.2
lies in its uniformity over all systems of a fixed complexity, which plays an important
role in the inductive argument. In fact, the ergodic averages (3.1) associated
to a system $g$ for which the reductions $g_m^*$ do not satisfy stability bounds which are
uniform in $m$ may not necessarily converge, even if the ergodic averages associated
to each individual reduction $g_m^*$ do converge.

The rest of this section is devoted to the proof of Theorem 3.2. In §4 we will
show that every system of the form given in Theorem 1.1 has finite complexity,
thereby completing the proof of that theorem.



3.1. $L$-reducible functions. Since Theorem 3.2 is trivially true when $d = 0$, we
may proceed by induction. Thus, let $d > 0$ be some positive integer and assume
the result holds for every $d' < d$. Let $F$ and $0 < \varepsilon < 1$ be as in the statement of the
theorem and let $g = (g_1, \dots, g_j)$ be some system of complexity at most $d$. Since it
clearly suffices to prove the result for any system equivalent to $g$, by definition of
the complexity we may assume without loss of generality that $g_m^*$ has complexity
$\le d - 1$ for every positive integer $m$.

Let $C^*$ denote the quantity $C^{\delta,\eta}$ defined in (2.1) associated to $\delta := \varepsilon/(2^5 \cdot 3)$ and
$\eta(x) := \varepsilon^2/(2^3 3^3 x)$, so that in particular $C^*$ depends only on $\varepsilon$. We will sometimes
use the shorthands $\|\cdot\|_\infty$ for $\|\cdot\|_{L^\infty(X)}$, $\|\cdot\|_2$ for $\|\cdot\|_{L^2(X)}$ and $\langle \cdot, \cdot \rangle$ for $\langle \cdot, \cdot \rangle_{L^2(X)}$. The
following definition will be crucial.

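Definition 3.3 (reducible functions). Given a positive integer $L$, we say $\sigma \in
L^\infty(X)$, $\|\sigma\|_\infty \le 1$, is an $L$-reducible function (with respect to $g$), if there exists
some integer $M > 0$ and a family $b_0, b_1, \dots, b_{j-1} \in L^\infty(X)$ with $\|b_i\|_\infty \le 1$, such
that for every positive integer $l \le L$

$$\left\| g_j(l)\sigma - \mathbb{E}_{m \in [M]} \left( \langle g_j | 1_G \rangle_m(l) \right) b_0 \prod_{i=1}^{j-1} \left( \langle g_j | g_i \rangle_m(l) \right) b_i \right\|_{L^\infty(X)} < \frac{\varepsilon}{16 C^*}.$$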

Reducible functions will play a role similar to the one played by basic
anti-uniform functions in [19]. We stress that we do not care for the value of $M$ in
Definition 3.3. We will show in Lemma 3.4 below that every function giving rise
to a large average must resemble a reducible function. The main feature of these
objects is that the role of the $G$-sequence $g_j$ in the averages (3.1) can essentially be
recovered by means of the set of $G$-sequences $\langle g_j | 1_G \rangle_m, \langle g_j | g_1 \rangle_m, \dots, \langle g_j | g_{j-1} \rangle_m$.

Lemma 3.4 (Weak inverse result for ergodic averages). Assume the inequality
$\|A_N^g[f_1, \dots, f_{j-1}, u]\|_2 > \varepsilon/6$ holds for some $\|u\|_{L^\infty(X)} \le 3C$, some $1 \le C \le C^*$
and some $f_1, \dots, f_{j-1} \in L^\infty(X)$ with $\|f_i\|_\infty \le 1$. Then, there exists some constant
$0 < c_1 < 1$, depending only on $\varepsilon$, such that for every positive integer $L < c_1 N$ there
is an $L$-reducible function $\sigma$ with $\langle u, \sigma \rangle > 2\eta(C)$.
Proof. We begin by noticing that $\|A_N^g[f_1, \dots, f_{j-1}, u]\|_2^2 = \langle u, h \rangle$, where

$$h := \mathbb{E}_{n \in [N]}\, g_j(n)^{-1} A_N^g[f_1, \dots, f_{j-1}, u] \prod_{i=1}^{j-1} g_j(n)^{-1} g_i(n) f_i. \tag{3.4}$$

We claim $\sigma := h/3C$ is an $L$-reducible function for every $L < c_1 N$ and some
$0 < c_1 < 1$ depending only on $\varepsilon$, from where the result immediately follows since
by the observation above it is $\langle u, \sigma \rangle > \varepsilon^2/(108 C) = 2\eta(C)$.

It remains to prove this claim. Write $c_1 := \varepsilon/(96 (C^*)^2)$ and assume $0 < l < c_1 N$.
Then, if we shift $[N]$ to $[N] + l$ we see that the right hand side of (3.4) changes by
a magnitude of at most $6lC^*/N < \varepsilon/(16C^*)$ (since $\|A_N^g[f_1, \dots, f_{j-1}, u]\|_\infty \le 3C \le
3C^*$) and thus

$$\left\| h - \mathbb{E}_{n \in [N]}\, g_j(l+n)^{-1} A_N^g[f_1, \dots, f_{j-1}, u] \prod_{i=1}^{j-1} g_j(l+n)^{-1} g_i(l+n) f_i \right\|_\infty < \frac{\varepsilon}{16 C^*}.$$

Applying $g_j(l)$ we get

$$\left\| g_j(l) h - \mathbb{E}_{n \in [N]} \left( \langle g_j | 1_G \rangle_n(l) \right) A_N^g[f_1, \dots, f_{j-1}, u] \prod_{i=1}^{j-1} \left( \langle g_j | g_i \rangle_n(l) \right) f_i \right\|_\infty < \frac{\varepsilon}{16 C^*}.$$

The claim then follows with $M := N$, $b_0 := \frac{1}{3C} A_N^g[f_1, \dots, f_{j-1}, u]$ and $b_i := f_i$. □

As mentioned earlier, the advantage of $L$-reducible functions is that they allow us
to reduce the study of the ergodic averages of $g$ to the study of averages arising
from the reductions $g_m^*$, which we already know to satisfy uniform stability bounds
by the induction hypothesis. This idea is carried out in the next proposition.

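Proposition 3.5. For every positive integer $M_*$ there exists a sequence

$$M_* \le M_1 \le \dots \le M_K \le M^* = O_{M_*, \varepsilon, d, F}(1), \tag{3.5}$$

depending only on $M_*$, $\varepsilon$, $d$ and $F$, and with $K$ depending only on $\varepsilon$ and $d$, such that
if $f_1, \dots, f_{j-1} \in L^\infty(X)$ with $\|f_i\|_\infty \le 1$ and $f = \sum_{t=0}^{k-1} \lambda_t \sigma_t$, where $\sum_{t=0}^{k-1} |\lambda_t| \le C^*$
and each $\sigma_t$ is an $L$-reducible function for some $L \ge F(M^*)$, then there exists some
$1 \le i \le K$ such that

$$\left\| A_{N,N'}^g[f_1, \dots, f_{j-1}, f] \right\|_{L^2(X)} \le \varepsilon/4,$$

for every pair $M_i \le N, N' \le F(M_i)$.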

Proof. For every $\sigma_t$ let $M^{(t)}$ be the integer coming from the definition of an
$L$-reducible function and let $b_0^{(t)}, \dots, b_{j-1}^{(t)} \in L^\infty(X)$ be the corresponding family of functions.
It follows from the definition of an $L$-reducible function that for every $N \le L$ and
every $0 \le t \le k - 1$ we may replace $A_N^g[f_1, \dots, f_{j-1}, \sigma_t]$ by

$$\mathbb{E}_{m \in [M^{(t)}]} \mathbb{E}_{n \in [N]} \left( \prod_{i=1}^{j-1} g_i(n) f_i \right) \left( \langle g_j | 1_G \rangle_m(n) \right) b_0^{(t)} \prod_{i=1}^{j-1} \left( \langle g_j | g_i \rangle_m(n) \right) b_i^{(t)}$$

at the cost of an $L^\infty$ error of at most $\varepsilon/(16C^*)$. Therefore, we get by Minkowski's
inequality that for $N, N' \le L$, the norm $\|A_{N,N'}^g[f_1, \dots, f_{j-1}, f]\|_{L^2(X)}$ is bounded by

$$\sum_{t=0}^{k-1} |\lambda_t|\, \mathbb{E}_{m \in [M^{(t)}]} \left\| A_{N,N'}^{g_m^*}[f_1, \dots, f_{j-1}, b_0^{(t)}, b_1^{(t)}, \dots, b_{j-1}^{(t)}] \right\|_{L^2(X)} + \varepsilon/8. \tag{3.6}$$

We are thus given a large family of averages coming from the lower complexity
systems $g_m^*$. Write $\gamma := \varepsilon/(16 C^*)$. Clearly, it would suffice to find a suitable interval
on which each of these lower dimensional averages is bounded by $\gamma$. Although this
will not be possible, we will indeed show by repeated applications of the induction
hypothesis that we can get such a bound for all but a negligible subset of these
averages. In order to do this, consider non-decreasing functions $F_1, \dots, F_r : \mathbb{N} \to \mathbb{N}$,
for some $r = O_{\varepsilon,d}(1)$ to be specified, defined recursively by $F_r := F$ and $F_{i-1}(N) :=
\max_{1 \le M \le N} F_i(M^{\gamma, F_i, d-1})$, where we are using the notation in the statement of
Theorem 3.2. Also, let $K := K_{\gamma, d-1}$ be as in that theorem and for each tuple
$1 \le i_1, \dots, i_s \le K$, $s \le r$, and integer $M$, we define recursively

$$M^{(i_1)} := M_{i_1}^{\gamma, F_1, d-1}, \qquad M^{(i_1, \dots, i_s)} := \left( M^{(i_1, \dots, i_{s-1})} \right)_{i_s}^{\gamma, F_s, d-1}.$$

Thus, $M^{(i_1)}$ is the integer $M_{i_1}^{\gamma, F_1, d-1}$ obtained in (3.2) by starting at $M$, $M^{(i_1, i_2)}$ is
the integer $M_{i_2}^{\gamma, F_2, d-1}$ obtained by starting the sequence (3.2) at $M = M^{(i_1)}$, etc.
In particular, notice that this sequence depends only on $\varepsilon$, $F$, $d$ and $M$. Observe
also that since each of the averages in (3.6) satisfies



"™N,N' 



A,... b®,b?\..., bf^ (3.7) 



the sum in (3.6) is bounded by $\sum_{t=0}^{k-1} |\lambda_t| \le C^*$.



We now proceed as follows. By the induction hypothesis we know that each of the
reduced averages in (3.6) is bounded by $\gamma$ for every pair

$$N, N' \in \left[ M_*^{(i)}, F_1(M_*^{(i)}) \right]$$

and some $1 \le i \le K$, which depends on the particular average. By the pigeonhole
principle and (3.7), this implies that we may find some $1 \le i_1 \le K$ such that
the contribution to (3.6) of those averages which are not bounded by $\gamma$ for every
pair $N, N' \in [M_*^{(i_1)}, F_1(M_*^{(i_1)})]$ is at most $\left( \frac{K-1}{K} \right) C^*$. We now apply the
induction hypothesis to these remaining averages with the function $F_2$, the parameter
$\gamma$ and the starting point $M_*^{(i_1)}$. This way, for each of these remaining averages,
we know that there exists some $1 \le i \le K$ such that the average is bounded
by $\gamma$ for every pair $N, N' \in [M_*^{(i_1, i)}, F_2(M_*^{(i_1, i)})]$. Since by construction of $F_1$ it
is $[M_*^{(i_1, i)}, F_2(M_*^{(i_1, i)})] \subset [M_*^{(i_1)}, F_1(M_*^{(i_1)})]$, we see that those averages which we
bounded in the previous step remain bounded by $\gamma$ on each of these new intervals.
Thus, we may apply the pigeonhole principle as before to find some $1 \le i_2 \le K$
such that the contribution to (3.6) of those averages which are not bounded by $\gamma$
for every pair $N, N' \in [M_*^{(i_1, i_2)}, F_2(M_*^{(i_1, i_2)})]$ is at most $\left( \frac{K-1}{K} \right)^2 C^*$.

Iterating the above process $r$ times, we find a tuple $1 \le i_1, \dots, i_r \le K$ such that
the set of reduced averages which are not bounded by $\gamma$ for every pair $N, N' \in
[M_*^{(i_1, \dots, i_r)}, F_r(M_*^{(i_1, \dots, i_r)})] = [M_*^{(i_1, \dots, i_r)}, F(M_*^{(i_1, \dots, i_r)})]$, contributes at most

$$\left( \frac{K-1}{K} \right)^r C^* < \varepsilon/16,$$

to (3.6), upon choosing $r$ sufficiently large in terms of $\varepsilon$ and $d$. Since the sum
over the remaining terms will be bounded by $\sum_{t=0}^{k-1} |\lambda_t| \gamma \le \varepsilon/16$ we conclude
that (3.6), and therefore $\|A_{N,N'}^g[f_1, \dots, f_{j-1}, f]\|_{L^2(X)}$, is bounded by $\varepsilon/4$, for every
$N, N' \in [M_*^{(i_1, \dots, i_r)}, F(M_*^{(i_1, \dots, i_r)})]$. Notice that while the specific integer $M_*^{(i_1, \dots, i_r)}$
we have obtained depends on the set of functions $f_1, \dots, f_{j-1}, f$ and the system $g$,
this integer belongs to the sequence $M_*^{(j_1, \dots, j_r)}$, $1 \le j_1, \dots, j_r \le K$, which depends
only on $F$, $\varepsilon$, $d$ and $M_*$. The result follows from this observation with the sequence
(3.5) given by the integers $M_*^{(j_1, \dots, j_r)}$, $1 \le j_1, \dots, j_r \le K$. □
3.2. Proof of Theorem 3.2. We can now conclude the proof of Theorem 3.2. As
before, we fix $X$, $G$, $F$, $\varepsilon$, $d$ and $g$ as in the statement of that theorem
and assume without loss of generality that each reduction $g_m^*$ of $g$ is of complexity
at most $d - 1$ and that the result is already proven for every $d' < d$. We will
also write $M_0$ for the integer $M$ to be chosen as the starting point of the sequence
(3.2) in Theorem 3.2. Let $\delta, \eta$ be as specified at the beginning of §3.1 and write
$C_i := C_i^{\delta,\eta}$ for the constants defined in (2.1). Given some positive integer $L$ write
$\Sigma_L$ for the set of $L$-reducible functions and set

$$\Sigma_L^+ := \Sigma_L \cup B_2(\delta/C^*),$$

where we write $B_2(\delta/C^*)$ for the set of $f \in L^2(X)$ with $\|f\|_2 \le \delta/C^*$. Consider
on $L^2(X)$ the norms $\|\cdot\|_L := \|\cdot\|_{\Sigma_L^+}$ defined as in (2.3). It is easy to see that these
norms are well defined and equivalent to $\|\cdot\|_{L^2(X)}$ (by the presence of the small
$L^2$ ball and the fact that reducible functions are bounded by 1). Also, notice that
$\Sigma_{L+1}^+ \subset \Sigma_L^+$ for every $L$, which in turn implies (by Lemma 2.4) that $\|\cdot\|_{L+1}^* \le \|\cdot\|_L^*$.

Given any integer $M$ write $\psi(M) := F(M^*)$ where $M^*$ is the integer obtained
in Proposition 3.5 with $M_* = M$. Let $f_1, \dots, f_j \in L^\infty(X)$, $\|f_i\|_\infty \le 1$, be given
and consider for $f_j$ a decomposition of the form provided in Proposition 2.3 with
$(\|\cdot\|_L)_{L \in \mathbb{N}}$, $\psi$, $\delta$, $\eta$ as above and $c$ equal to the constant $c_1$ in Lemma 3.4. This
allows us to find a constant $1 \le C_i \le C^*$ and some integer $M$ with $M_0 \le M =
O_{M_0, \varepsilon, F, d}(1)$, such that

$$f_j = \sum_{t=0}^{k-1} \lambda_t \sigma_t + u + v, \tag{3.8}$$

where $\sum_{t=0}^{k-1} |\lambda_t| \le C_i$, each $\sigma_t$ belongs to $\Sigma_B^+$ for some $B \ge \psi(M)$ (and therefore
to $\Sigma_{\psi(M)}^+$), $\|u\|_A^* \le \eta(C_i)$ for some $A < c_1 M$ and $\|v\|_2 \le \delta$. We remark that this
constant $C_i$ is the one defined in (2.1) and that the integer $M$ obtained belongs
to the sequence given in Proposition 2.3, which does not depend on the family of
norms $(\|\cdot\|_L)_{L \in \mathbb{N}}$ and in the present case is therefore independent of the particular
system $g$ we have fixed (although it certainly depends on its complexity $d$, as well
as on $\varepsilon$, $F$ and $M_0$). Since $\|\sum_t \lambda_t \sigma_t\|_2 \le \delta$, where the sum is restricted to those
$\sigma_t \in B_2(\delta/C^*)$, we may assume that each $\sigma_t$ in (3.8) actually belongs to $\Sigma_{\psi(M)}$ at
the cost of softening our bound on $v$ to $\|v\|_2 \le 2\delta$.

We would like to use Lemma 3.4 to study the function $u$, but first we need to gain
some control on its $L^\infty$ norm. In order to do this, denote by $S \subset X$ the set of points
on which the inequality $|v(s)| \le C_i$ holds (in particular one has $\mu(S^c) \le (2\delta/C_i)^2$)
and write $v' := u 1_{S^c} + v$. From the fact that $\|\sigma_t\|_{L^\infty(X)} \le 1$ for every $\sigma_t \in \Sigma_{\psi(M)}$,
(3.8) and the definition of $S$, one easily checks that $|u 1_{S^c}(x)| \le 3|v(x)|$ a.e. and
therefore $\|u 1_{S^c}\|_2 \le 3\|v\|_2 \le 6\delta$. Hence, it follows that for every pair of integers $N, N'$,

$$\left\| A_{N,N'}^g[f_1, \dots, f_{j-1}, v'] \right\|_2 \le \left\| A_{N'}^g[f_1, \dots, f_{j-1}, v'] \right\|_2 + \left\| A_N^g[f_1, \dots, f_{j-1}, v'] \right\|_2 \le 2(4\|v\|_2) < \varepsilon/3, \tag{3.9}$$

where we are using Minkowski's inequality and the fact that $\|f_i\|_\infty \le 1$ for every
$1 \le i \le j - 1$. Consider now $u 1_S$. Similarly as above, one sees that $\|u 1_S\|_{L^\infty(X)} \le
3C_i$. Also, it follows from Lemma 2.4 that for every $\sigma \in \Sigma_A$ it is

$$|\langle u 1_S, \sigma \rangle| \le |\langle u, \sigma \rangle| + |\langle u 1_{S^c}, \sigma 1_{S^c} \rangle| \le \|u\|_A^* + \|u 1_{S^c}\|_2 \|\sigma 1_{S^c}\|_2 \le \eta(C_i) + 12\delta^2/C_i < 2\eta(C_i).$$



We are now in a position to apply Lemma 3.4, which implies that for every pair
$N, N' \ge M$

$$\left\| A_{N,N'}^g[f_1, \dots, f_{j-1}, u 1_S] \right\|_2 \le \left\| A_{N'}^g[f_1, \dots, f_{j-1}, u 1_S] \right\|_2 + \left\| A_N^g[f_1, \dots, f_{j-1}, u 1_S] \right\|_2 \le \varepsilon/3. \tag{3.10}$$

It only remains to analyze $\sum_{t=0}^{k-1} \lambda_t \sigma_t$. But we may now invoke Proposition 3.5 to
conclude from our choice of $\psi$ that

$$\left\| A_{N,N'}^g\left[ f_1, \dots, f_{j-1}, \sum_{t=0}^{k-1} \lambda_t \sigma_t \right] \right\|_2 \le \varepsilon/3, \tag{3.11}$$

for every pair $M_i \le N, N' \le F(M_i)$ and some $M_i \in [M, \psi(M)]$ which belongs
to the corresponding sequence (3.5). Theorem 3.2 then follows from (3.8), (3.9),
(3.10), (3.11) and Minkowski's inequality.

4. The complexity of polynomial systems 

In this section we will prove that every system of the form given in Theorem
1.1 has finite complexity, thereby finishing the proof of that theorem. In order to
do this, we begin by reviewing some facts about polynomial sequences in nilpotent
groups. For a detailed treatment of this topic, the reader is referred to the work of
Leibman [16], [17].

For a $G$-sequence $g = (g(n))_{n \in \mathbb{Z}}$ taking values in a nilpotent group $G$ and some
integer $m$, we define the operator $D_m$ which takes $g$ to the $G$-sequence $(D_m g)(n) :=
g(n) g(n+m)^{-1}$. In particular, we have $\langle g | h \rangle_m(n) = (D_m g)(n) h(n+m)$, for every
pair of $G$-sequences $g, h$ and every positive integer $m$. We say that a $G$-sequence
is polynomial if there exists some positive integer $d$ such that for every choice of
integers $m_1, \dots, m_d$, we have $D_{m_1} \cdots D_{m_d} g = 1_G$, where we recall that $1_G$ stands for
the constant sequence which equals the identity of $G$. It is known that if $(g(n))_{n \in \mathbb{Z}}$
is a sequence in a nilpotent group $G$ which is of the form

$$g(n) = T_1^{p_1(n)} \cdots T_k^{p_k(n)}, \tag{4.1}$$

for some $T_1, \dots, T_k \in G$ and some set of integer valued polynomials $p_1, \dots, p_k$, then
$g$ is a polynomial sequence. Indeed, each $T_i^{p_i(n)}$ is clearly a polynomial sequence
and the product of polynomial sequences is polynomial by Lemma 4.1 below (the
converse also holds, see for example [16]).
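The definition of a polynomial sequence can be tested by hand on a small example. The sketch below again uses the integer Heisenberg group (triples $(a, b, c)$ encoding upper unitriangular matrices), which is our own choice of test case, with hypothetical helper names. It checks that the sequence $g(n) = T^n S^n$ of the form (4.1) is annihilated by three successive operators $D_m$, but not by two.

```python
from typing import Callable, Tuple

Elem = Tuple[int, int, int]      # (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]]
Seq = Callable[[int], Elem]

def mul(x: Elem, y: Elem) -> Elem:
    """Product in the integer Heisenberg group."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x: Elem) -> Elem:
    a, b, c = x
    return (-a, -b, a * b - c)

def D(g: Seq, m: int) -> Seq:
    """The operator (D_m g)(n) = g(n) g(n+m)^{-1}."""
    return lambda n: mul(g(n), inv(g(n + m)))

def g(n: int) -> Elem:
    """g(n) = T^n S^n with T = (1, 0, 0) and S = (0, 1, 0), which equals (n, n, n^2)."""
    return mul((n, 0, 0), (0, n, 0))

print(D(g, 2)(1))              # (-2, -2, -2)
print(D(D(g, 2), 3)(0))        # (0, 0, 6): a central element, not yet the identity
print(D(D(D(g, 1), 2), 3)(7))  # (0, 0, 0)
```

In general one finds $(D_m g)(n) = (-m, -m, -nm)$ and $(D_{m'} D_m g)(n) = (0, 0, m'm)$, so two operators already push the sequence into the center of the group and a third one kills it.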

By a polynomial system we shall mean a system $g = (g_1, \dots, g_j)$, where each
$g_i$, $1 \le i \le j$, is a polynomial sequence. We define the size of such a system to be
$|g| = j$. To prove Theorem 1.1 it will suffice, by Theorem 3.2 and the fact that
sequences of the form (4.1) are polynomial, to prove that every polynomial system
has finite complexity.

In order to proceed, we will need to define the degree of a polynomial sequence.
Unfortunately, the natural choice of taking the least positive integer $d$ for which
every $d$ successive applications of the above operators return the identity is not
appropriate for our purposes, since with this definition the set of polynomial
sequences of degree $\le d$ need not form a group. In order to amend this, we need to
introduce some notation. Write $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and $\mathbb{N}_* = \mathbb{N}_0 \cup \{-\infty\}$. We say a
vector $d = (d_1, \dots, d_k) \in \mathbb{N}_*^k$ is superadditive if $d_i \le d_{i+1}$ for every $1 \le i < k$ and
$d_i + d_j \le d_{i+j}$ for every pair $i, j$, where we are using the conventions $-\infty + t = -\infty$
for every $t \in \mathbb{N}_*$ and $-\infty < r$ for every $r \in \mathbb{N}_0$. Also, given a superadditive vector
$d = (d_1, \dots, d_k)$ and some nonnegative integer $t$, we write $d - t := (d'_1, \dots, d'_k)$,
where $d'_i = d_i - t$ if $t \le d_i$ and $d'_i = -\infty$ otherwise. Notice that $d - t$ so defined is
also a superadditive vector.
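The two operations just introduced are easy to experiment with. In the sketch below the function names are hypothetical, $-\infty$ is modelled by the floating point value `float("-inf")`, and we assume that the condition $d_i + d_j \le d_{i+j}$ is only required for pairs with $i + j \le k$, since $d_{i+j}$ is otherwise undefined.

```python
NEG_INF = float("-inf")   # stands for the symbol -infinity

def is_superadditive(d):
    """Check d_i <= d_{i+1} and d_i + d_j <= d_{i+j} (1-based indices, i + j <= k)."""
    k = len(d)
    monotone = all(d[i] <= d[i + 1] for i in range(k - 1))
    additive = all(
        d[i - 1] + d[j - 1] <= d[i + j - 1]
        for i in range(1, k + 1)
        for j in range(1, k + 1)
        if i + j <= k
    )
    return monotone and additive

def shift(d, t):
    """The vector d - t: subtract t where t <= d_i, and put -infinity elsewhere."""
    return tuple(x - t if t <= x else NEG_INF for x in d)

d = (2, 4, 6)                       # a vector of the form (d, 2d, 3d)
print(is_superadditive(d))          # True
print(shift(d, 3))                  # (-inf, 1, 3)
print(is_superadditive((1, 1, 3)))  # False, since d_1 + d_1 = 2 > 1 = d_2
```

On this example one can also confirm that every shift $d - t$ remains superadditive, in line with the remark above.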

Fix a nilpotent group $G$ of nilpotency class $s$ and let $G = G_1 \supset G_2 \supset \dots \supset
G_s \supset G_{s+1} = \{1_G\}$ be its lower central series. As in [16], [17], we say a sequence
$g = (g(n))_{n \in \mathbb{Z}}$ taking values in $G$ is a polynomial sequence of (vector) degree $\le
(d_1, \dots, d_s)$ if $(D_{m_1} \cdots D_{m_{d_k + 1}} g)(n) \in G_{k+1}$ for every $n$, every $1 \le k \le s$ and every
choice of $m_1, \dots, m_{d_k + 1} \in \mathbb{Z}$. If $d_k = -\infty$ we take this to mean that $g$ itself takes
values in $G_{k+1}$. We will make use of the following results of Leibman.

Lemma 4.1 ([17, §3]). Let $\bar d = (d_1, \dots, d_s)$ be a superadditive vector and let
$t, t_1, t_2 \ge 0$ be nonnegative integers. Then we have the following properties.

(1) If $g$ is a polynomial sequence of degree $\le \bar d - t$, then $D_m g$ is a polynomial
sequence of degree $\le \bar d - (t + 1)$, for every $m \in \mathbb{Z}$.

(2) The set of polynomial sequences of degree $\le \bar d - t$ forms a group.

(3) If $g$ is a polynomial sequence of degree $\le \bar d - t_1$ and $h$ is a polynomial
sequence of degree $\le \bar d - t_2$, then $[g, h]$ is a polynomial sequence of degree
$\le \bar d - (t_1 + t_2)$, where $[g, h](n) := g^{-1}(n) h^{-1}(n) g(n) h(n)$.

Remark. The results of [17] concern the operators $(\tilde D_m g)(n) := g(n)^{-1} g(n + m) =
(D_m g^{-1})(n)$. Nevertheless, using Lemma 4.1 for these operators and a straightforward
descending induction on $t$ one can easily check that a $G$-sequence $g$ has degree
$\le \bar d - t$ with respect to the operators $\tilde D_m$ if and only if it has degree $\le \bar d - t$ with
respect to the operators $D_m$, from where we recover Lemma 4.1 as stated.
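The definition of vector degree can be tested on a small example. The sketch below is only an illustration under stated assumptions: it takes $G$ to be the integer Heisenberg group (nilpotency class $s = 2$, with $G_2$ its centre and $G_3$ trivial), it uses the convention $(D_m g)(n) = g(n)\,g(n+m)^{-1}$, which is the one consistent with the identity in the Remark, and it checks on a finite range of $n$ and $m$ that $g(n) = T^n S^n$ has degree $\le (1, 2)$. All function names are hypothetical.

```python
from itertools import product

# Heisenberg group: (a, b, c) stands for the matrix [[1, a, c], [0, 1, b], [0, 0, 1]].
IDENTITY = (0, 0, 0)


def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])


def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])


def D(m, g):
    """Assumed convention: (D_m g)(n) = g(n) g(n+m)^{-1}."""
    return lambda n: mul(g(n), inv(g(n + m)))


def iterated_D(ms, g):
    """Apply D_m successively for each m in ms."""
    for m in ms:
        g = D(m, g)
    return g


def g(n):
    # g(n) = T^n S^n with T = (1, 0, 0) and S = (0, 1, 0).
    return mul((n, 0, 0), (0, n, 0))


def in_G2(x):
    # G_2 is the centre {(0, 0, c)} of the Heisenberg group.
    return x[0] == 0 and x[1] == 0


R = range(-3, 4)
# d_1 = 1: any d_1 + 1 = 2 operators send g into G_2.
two_ok = all(in_G2(iterated_D(ms, g)(n)) for *ms, n in product(R, repeat=3))
# d_2 = 2: any d_2 + 1 = 3 operators send g into G_3 = {1}.
three_ok = all(iterated_D(ms, g)(n) == IDENTITY for *ms, n in product(R, repeat=4))
```

A single operator is not enough to reach $G_2$: here $(D_1 g)(0) = (-1, -1, 0)$, so $d_1 = 0$ would fail, and the vector $(1, 2)$ is superadditive as required.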

We say a polynomial system $\mathbf{g} = (g_1, \dots, g_j)$ has degree $\le \bar d$ if the degree of
$g_i$ is $\le \bar d$, for every $1 \le i \le j$. We will show that any system of degree $\le \bar d$,
for some superadditive vector $\bar d = (d_1, \dots, d_s)$, has finite complexity. Notice that
this is enough to prove Theorem 1.1 since if a polynomial sequence $g$ has degree $\le
(d_1, \dots, d_s)$, then it also has degree $\le (d, 2d, \dots, sd)$, with $d = \max\{d_i : 1 \le i \le s\}$,
and this last vector is clearly superadditive.

Given a polynomial system $\mathbf{g}$, we are concerned with the process that consists
of passing from $\mathbf{g}$ to an equivalent system $\mathbf{g}'$, then taking the $m$-reduction $(\mathbf{g}')^*_m$
of $\mathbf{g}'$ for some $m$, passing to an equivalent system $((\mathbf{g}')^*_m)'$ and then taking the
$m'$-reduction of this for some $m'$, etc. What we are free to choose in the above
process is to which equivalent system we apply the reductions (but not the integer
on which we subsequently reduce) and our objective is to show that there exists
some constant $C$, depending on $\mathbf{g}$, such that for every sequence of positive integers
$m, m', m'', \dots$ we can go to the trivial system $(1_G)$ by means of at most $C$ repetitions
of the above transformations. This clearly implies that the complexity of $\mathbf{g}$ is at
most $C$.

In order to simplify notation we will omit the reference to the specific sequence
of integers on which we reduce. So for instance, we will generically refer to the
reduction of a system $\mathbf{g} = (g_1, \dots, g_j)$ to be the system

$$\mathbf{g}^* = (g_1, \dots, g_{j-1}, \langle g_j | 1_G \rangle, \langle g_j | g_1 \rangle, \dots, \langle g_j | g_{j-1} \rangle).$$

Similarly, we have the identity

$$\langle g | h \rangle = Dg \, (Dh)^{-1} h,$$

provided, of course, that the omitted subindices are the same. We define a step to
be the process of passing from a system $\mathbf{g}$ to the reduction $(\mathbf{g}')^*$ of some system
$\mathbf{g}'$ equivalent to $\mathbf{g}$. We will show that one can pass from a polynomial system $\mathbf{g}$ to
the trivial system in a number of steps which is bounded in terms of the size and
degree of $\mathbf{g}$.

We define the complete reduction of a system $\mathbf{g}$ to be the system

$$\mathbf{g}^{**} = (g_1, \dots, g_{j-1}, \langle g_j | g_1 \rangle, \dots, \langle g_j | g_{j-1} \rangle).$$

Thus, $\mathbf{g}^{**} = \mathbf{g}^* \setminus \{\langle g_j | 1_G \rangle\}$. We define a complete step in the same way as a step, but
with the reduction replaced by the complete reduction. Complete steps are needed
for a technical reason related to the inductive process to be applied. Precisely, in
order to handle steps involving systems of degree $\le \bar d$ we will need to assume some
control on both steps and complete steps over systems of degree $\le \bar d - 1$.
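The bookkeeping behind the reduction and the complete reduction can be sketched on purely symbolic systems. The Python snippet below is only an illustration: sequences are represented by string labels, no group structure is modelled, and the function names are hypothetical. It shows that a system of size $j$ has a reduction of size $2j - 1$ and a complete reduction of size $2j - 2$.

```python
def reduction(system):
    """Symbolic g* = (g_1, ..., g_{j-1}, <g_j|1>, <g_j|g_1>, ..., <g_j|g_{j-1}>)."""
    *head, last = system
    return head + [f"<{last}|1>"] + [f"<{last}|{g}>" for g in head]


def complete_reduction(system):
    """Symbolic g** = g* with the entry <g_j|1> removed; needs size at least 2."""
    *head, last = system
    if not head:
        raise ValueError("the complete reduction of a system of size 1 is not defined")
    return head + [f"<{last}|{g}>" for g in head]


example = ["g1", "g2", "g3"]
reduced = reduction(example)
completely_reduced = complete_reduction(example)
```

Here `reduced` is `["g1", "g2", "<g3|1>", "<g3|g1>", "<g3|g2>"]`, and `completely_reduced` omits the entry `"<g3|1>"`.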

Theorem 1.1 follows from Theorem 3.2 and the following result.

Theorem 4.2. Let $\mathbf{g}$ be a polynomial system of size $|\mathbf{g}| \le C_1$ and degree $\le \bar d$, for
some superadditive vector $\bar d = (d_1, \dots, d_s)$. Then,

• one can go from $\mathbf{g}$ to the trivial system $(1_G)$ in $O_{C_1, \bar d}(1)$ steps,

• one can go from $\mathbf{g}$ to a system consisting of a single sequence of degree $\le \bar d$
in $O_{C_1, \bar d}(1)$ complete steps,

for every sequence of positive integers $m, m', m'', \dots$. In particular, $\mathbf{g}$ has complexity
$O_{C_1, \bar d}(1)$.

Proof. Let $\bar d$ be as in the statement. We begin by noticing that the result is trivially
true for systems of degree $\le \bar d - (d_s + 1) = (-\infty, \dots, -\infty)$, since $1_G$ is the only
sequence lying in $G_{k+1}$ for every $1 \le k \le s$. We will proceed by induction. Since
$\bar d - t$ is superadditive for every $t \ge 0$, it will suffice to prove that if Theorem 4.2
holds for systems of degree $\le \bar d - 1$ then it also holds for systems of degree $\le \bar d$.

Thus, let $\mathbf{g}$ be as in the statement. We will first prove that we can go from $\mathbf{g}$ to
the trivial system in $O_{C_1, \bar d}(1)$ steps (and therefore, that $\mathbf{g}$ has complexity $O_{C_1, \bar d}(1)$).
In order to do this, observe that $\mathbf{g}$ can be rewritten in the form

$$\mathbf{g} = \mathbf{h}_0 \cup \bigcup_{i=1}^{l} s_i \mathbf{h}_i, \tag{4.2}$$

for some polynomial sequences $s_1, \dots, s_l$ of degree $\le \bar d$, $l \le C_1$, and some polynomial
systems $\mathbf{h}_i$ of degree $\le \bar d - 1$ and size $\le C_1$, with $\mathbf{h}_0$ possibly empty (for example,
one could simply take $s_i = g_i$ and $\mathbf{h}_i = (1_G)$ for every $1 \le i \le l$). Here, if
$\mathbf{h} = (h_1, \dots, h_k)$, $s\mathbf{h}$ is the system $(s h_1, \dots, s h_k)$ and the union of two systems
$(h_1, \dots, h_k)$, $(h_1', \dots, h_r')$ is understood to be the system $(h_1, \dots, h_k, h_1', \dots, h_r')$.

The idea will be to show that for systems of the form (4.2) one can perform
steps in such a way that the resulting systems are also of the form (4.2) for the
same set of sequences $s_1, \dots, s_l$. Furthermore, we will show that in finitely many
steps we may actually discard the sequence $s_l$, therefore arriving at a system like
(4.2) in which only the sequences $s_1, \dots, s_{l-1}$ are present. Iterating this $l$ times we
shall then end up with a system of degree $\le \bar d - 1$, from where one can proceed by
induction.
In order to carry out this plan we begin by observing that if $s_i, s_j$ are sequences
of degree $\le \bar d$ and $h_i, h_j$ are sequences of degree $\le \bar d - 1$, we have

$$\langle s_j h_j | s_i h_i \rangle = D(s_j h_j) \, (D(s_i h_i))^{-1} s_i h_i$$
$$= s_i \, D(s_j h_j) (D(s_i h_i))^{-1} \left[ D(s_j h_j) (D(s_i h_i))^{-1}, s_i \right] h_i$$
$$= s_i h'_{j,i}, \tag{4.3}$$

for some polynomial sequence $h'_{j,i}$ which is seen to have degree $\le \bar d - 1$ by Lemma
4.1. Furthermore, if $s_i = s_j = s$, it is easy to check that

$$\langle s h_j | s h_i \rangle = s \langle h_j | h_i \rangle.$$
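The two algebraic identities used here, namely $xs = sx[x, s]$ (which gives the second line of (4.3)) and $\langle s h_j | s h_i \rangle = s \langle h_j | h_i \rangle$, hold for arbitrary $G$-sequences and can be sanity-checked numerically. The sketch below does so in the integer Heisenberg group; it assumes the conventions $(D_m g)(n) = g(n) g(n+m)^{-1}$ and $\langle g | h \rangle_m(n) = (D_m g)(n)\, h(n+m)$, which are the ones consistent with the first line of (4.3). The test sequences and function names are arbitrary, hypothetical choices.

```python
from itertools import product

# Heisenberg group: (a, b, c) stands for the matrix [[1, a, c], [0, 1, b], [0, 0, 1]].


def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])


def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])


def pointwise(s, h):
    """The sequence n -> s(n) h(n)."""
    return lambda n: mul(s(n), h(n))


def bracket(g, h, m):
    """Assumed convention: <g|h>_m(n) = g(n) g(n+m)^{-1} h(n+m)."""
    return lambda n: mul(mul(g(n), inv(g(n + m))), h(n + m))


def commutator(x, y):
    """[x, y] = x^{-1} y^{-1} x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


# Arbitrary test sequences (the identities do not need them to be polynomial).
s = lambda n: (n, n * n, 0)
h_i = lambda n: (0, n, n)
h_j = lambda n: (n, 1, -n)


def identities_hold(values=range(-3, 4)):
    for n, m in product(values, values):
        lhs = bracket(pointwise(s, h_j), pointwise(s, h_i), m)(n)
        rhs = mul(s(n), bracket(h_j, h_i, m)(n))
        if lhs != rhs:
            return False
        x, y = h_j(n), s(m)
        if mul(x, y) != mul(mul(y, x), commutator(x, y)):
            return False
    return True
```

The group is genuinely noncommutative, so the commutator correction in (4.3) is not vacuous in this test.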

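It follows from these formulas that, provided $|\mathbf{h}_l| > 1$, the reduction $\mathbf{g}^*$ of $\mathbf{g}$ is
equivalent to a system of the form

$$\mathbf{h}_0^{(1)} \cup \bigcup_{i=1}^{l-1} s_i \mathbf{h}_i^{(1)} \cup s_l \mathbf{h}_l^{**}, \tag{4.4}$$

for some systems $\mathbf{h}_0^{(1)}, \dots, \mathbf{h}_{l-1}^{(1)}$ of degree $\le \bar d - 1$ and size $|\mathbf{h}_0^{(1)}| \le 2|\mathbf{h}_0| + 1$ and
$|\mathbf{h}_i^{(1)}| \le 2|\mathbf{h}_i|$ for every other $i$, and where we recall that $\mathbf{h}_l^{**}$ refers to the complete
reduction of $\mathbf{h}_l$. Explicitly, if $\mathbf{h}_i = (h_{i,1}, \dots, h_{i,j_i})$ for every $0 \le i \le l$, then

$$\mathbf{h}_0^{(1)} = \left( \langle s_l h_{l,j_l} | 1_G \rangle, h_{0,1}, \dots, h_{0,j_0}, \langle s_l h_{l,j_l} | h_{0,1} \rangle, \dots, \langle s_l h_{l,j_l} | h_{0,j_0} \rangle \right),$$

while $s_i \mathbf{h}_i^{(1)}$ equals

$$\left( s_i h_{i,1}, \dots, s_i h_{i,j_i}, \langle s_l h_{l,j_l} | s_i h_{i,1} \rangle, \dots, \langle s_l h_{l,j_l} | s_i h_{i,j_i} \rangle \right),$$

for every $1 \le i \le l - 1$. We see by (4.3) that this is of the desired form.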

Observe now that if $\mathbf{h}$ is equivalent to $\mathbf{h}'$ then the system $s\mathbf{h}$ is also equivalent
to $s\mathbf{h}'$. Since by the induction hypothesis we know that one can pass from $\mathbf{h}_l$ to a
system $\mathbf{h}$ consisting of a single sequence of degree $\le \bar d - 1$ in $O_{C_1, \bar d - 1}(1)$ complete
steps, it follows from the above observation and (4.4) that we may pass from $\mathbf{g}$ to
a system of the form

$$\mathbf{h}_0^{(2)} \cup \bigcup_{i=1}^{l-1} s_i \mathbf{h}_i^{(2)} \cup s_l \mathbf{h}, \tag{4.5}$$

in $O_{C_1, \bar d}(1)$ steps, where each system $\mathbf{h}_i^{(2)}$ has degree $\le \bar d - 1$ and size $O_{C_1, \bar d}(1)$,
and $\mathbf{h}$ is a system consisting of a single sequence of degree $\le \bar d - 1$. But then we
see from (4.3) that the reduction of (4.5) will be of the form

$$\mathbf{h}_0^{(3)} \cup \bigcup_{i=1}^{l-1} s_i \mathbf{h}_i^{(3)},$$

with each $\mathbf{h}_i^{(3)}$ having degree $\le \bar d - 1$ and size $O_{C_1, \bar d}(1)$. We have therefore succeeded
in discarding the sequence $s_l$ from our system. We can now repeat the same process
as before with $s_{l-1}$ in place of $s_l$. Since the size of $\mathbf{h}_{l-1}^{(3)}$ is $O_{C_1, \bar d}(1)$, we see that
this new process finishes in $O_{C_1, \bar d}(1)$ steps, leaving us with a system of the form

$$\mathbf{h}_0^{(4)} \cup \bigcup_{i=1}^{l-2} s_i \mathbf{h}_i^{(4)}.$$

Therefore, iterating the above process $l$ times, we are finally left in $O_{C_1, \bar d}(1)$ steps
with a system of degree $\le \bar d - 1$ from where we may apply the induction hypothesis
to obtain the trivial system in $O_{C_1, \bar d}(1)$ further steps, thereby completing the proof
of the finite complexity of $\mathbf{g}$.

Now it only remains to show that one can pass from $\mathbf{g}$ to a system consisting of
a single sequence of degree $\le \bar d$ in $O_{C_1, \bar d}(1)$ complete steps. But it is clear that the
above reasoning to pass from $\mathbf{g}$ to a system of degree $\le \bar d - 1$ works in exactly the
same way for complete steps, since the only things that may change are the systems
$\mathbf{h}_0^{(2)}, \mathbf{h}_0^{(3)}, \dots$, which nevertheless will always have degree $\le \bar d - 1$ and whose
size may only be smaller than in the previous case. Thus, the above reasoning
allows us to pass to a system of degree $\le \bar d - 1$ from where we may apply induction,
as long as we are not left after any of the complete steps with a system which can
be written in its entirety as $s_i \mathbf{h}$, for some $s_i$ as above and $\mathbf{h}$ of degree $\le \bar d - 1$ and
size $|\mathbf{h}| = 1$ (because if the whole system has size 1 the complete reduction is not
defined). But since in such a case we are already done, this completes the proof of
Theorem 4.2 and therefore of Theorem 1.1. □



5. Further results 

The next result is easily seen to follow from the methods of this paper. 

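Theorem 5.1. Let $G$ be a nilpotent group of measure preserving transformations
of a probability space $(X, \mathcal{X}, \mu)$. Then, for every $T_1, \dots, T_l \in G$, every
$f_1, \dots, f_r \in L^\infty(X)$, every set of polynomials $p_{i,j} : \mathbb{Z}^d \to \mathbb{Z}$, and every Følner
sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$, the averages

$$\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{j=1}^{r} \left( T_1^{p_{1,j}(u)} \cdots T_l^{p_{l,j}(u)} \right) f_j \tag{5.1}$$

converge in $L^2(X, \mathcal{X}, \mu)$.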



During the proof of Theorem 1.1 we used crucially the fact that the $L^\infty$ norm
is an algebra norm ($\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$). While this is clearly not true for the
$L^2$ norm, if we are concerned with the study of a single function $f \in L^2(X)$, this
issue is no longer present. Furthermore, in this case our polynomial systems will
always have size 1, a fact that allows us to drop the hypothesis of nilpotency on 
our group G (because we no longer need the product of polynomial sequences to 
be polynomial). More generally, it is easy to see from these observations that our 
methods produce the following result, which was also conjectured by Bergelson and 
Leibman in [BJ. 

Theorem 5.2. Let $G$ be a group of unitary operators on a Hilbert space $\mathcal{H}$. If
$(g(n))_{n \in \mathbb{Z}}$ is a polynomial sequence in $G$, then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} g(n) u$$

exists for every $u \in \mathcal{H}$.
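The simplest instance of Theorem 5.2 can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes $\mathcal{H} = \mathbb{C}$, $u = 1$ and the one-dimensional unitary polynomial sequence $g(n) = e^{2\pi i \theta n^2}$ (an abelian, hence very special, case), and it evaluates finite averages rather than proving anything about the limit. The function name is hypothetical.

```python
import cmath
import math


def quadratic_average(theta, N):
    """Return (1/N) * sum_{n=1}^{N} exp(2 pi i theta n^2), i.e. the average of g(n)u with u = 1."""
    total = sum(cmath.exp(2j * math.pi * theta * n * n) for n in range(1, N + 1))
    return total / N


# Rational rotation: g(n) equals 1 for even n and i for odd n, so the averages tend to (1 + i)/2.
rational_average = quadratic_average(0.25, 1000)

# Irrational rotation: by Weyl equidistribution the averages tend to 0.
irrational_average = quadratic_average(math.sqrt(2), 20000)
```

The two cases show that the limit may or may not vanish, depending on the sequence, while existing in both.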

If the group G is assumed to be nilpotent this provides an alternative proof of 
a result already established by Bergelson and Leibman in [BJ . Also, it is important 
to notice that it is not presently known whether dropping the nilpotency
assumption on the group in fact yields a more general result.



Appendix A. Some examples of reductions 

We now provide some concrete examples of how the process studied in Section
4 returns the trivial system for some polynomial systems. Given systems $\mathbf{g}$ and $\mathbf{h}$
we write $\mathbf{g} \sim \mathbf{h}$ to mean that both systems are equivalent and we write $\mathbf{g} \xrightarrow{m} \mathbf{h}$ to
mean that $\mathbf{h}$ is the $m$-reduction of $\mathbf{g}$. Given measure preserving transformations
$T, S, C$ of a probability space, we will use the convention of writing $T^n S^n C$ for the
$G$-sequence $g$ given by $g(n) = T^n S^n C$.

Before giving the examples, we note that an unpleasant feature of the process
described in the body of the paper is that in the simplest cases it is unnecessarily
complicated (mainly, this happens when the group is abelian). A way to amend
this is the trivial observation that the averages (3.1) for sequences $g_i(n) = h_i(n) C_i$,
with $C_i$ some set of transformations, equal the averages associated to the system
$\mathbf{h} = (h_1, \dots, h_j)$ evaluated at the functions $C_i f_i$. Thus, we may extend the previous
equivalence relation to include those pairs of systems which may be obtained from
each other by adding or removing a constant in the above manner. It is easy to
check that the arguments of §3 work equally well with this alternative notion of
equivalence. We write $\mathbf{g} \sim^* \mathbf{h}$ if $\mathbf{g}$ and $\mathbf{h}$ are equivalent in this way and call this
modification of the process 'cheating'. While there is no substantial gain by using
this slight modification in the general case, many of the examples discussed below
become much cleaner in this way. Nevertheless, we will also show in every case how
the process is performed without cheating.

Example A.1. As a trivial example, suppose our system is constant, i.e. of the
form $(C_1, \dots, C_j)$ for some constant $G$-sequences $C_1, \dots, C_j$. Then its $m$-reduction
is equivalent to $(1_G, C_1, \dots, C_{j-1})$ for every $m$, so in particular we get the trivial
system after at most $j$ steps. Of course, if one is allowed to cheat, one has
$(C_1, \dots, C_j) \sim^* (1_G)$ to begin with.

Example A.2. Suppose we are given a linear system $(L_1^n C_1, \dots, L_j^n C_j)$ consisting
of commuting transformations $L_1, C_1, \dots, L_j, C_j$. The $m$-reduction of this system
is given by

$$(L_1^n C_1, \dots, L_{j-1}^n C_{j-1}, L_j^{-m}, L_j^{-m} L_1^{n+m} C_1, \dots, L_j^{-m} L_{j-1}^{n+m} C_{j-1}). \tag{A.1}$$

This highlights the advantage of cheating. Indeed, this would allow us to go to
the trivial system in $j$ steps, while without cheating we would require more than
$2 \uparrow\uparrow j$ steps. We now see how the latter is accomplished. Our objective is to
eliminate $L_{j-1}^n$ from the reduction (A.1) (cf. the general strategy given in the proof
of Theorem 4.2). In general, if we are given a system of the form

$$(L_1^n C_{1,1}, \dots, L_1^n C_{1,i_1}, \dots, L_k^n C_{k,1}, \dots, L_k^n C_{k,i_k}),$$

with the transformations generating an abelian group, its $m$-reduction will be
equivalent to

$$\big( L_k^{-m}, \; L_1^n C_{1,1}, \dots, L_1^n C_{1,i_1}, \; L_k^{-m} L_1^{m} L_1^n C_{1,1}, \dots, L_k^{-m} L_1^{m} L_1^n C_{1,i_1}, \; \dots,$$
$$L_{k-1}^n C_{k-1,1}, \dots, L_{k-1}^n C_{k-1,i_{k-1}}, \; L_k^{-m} L_{k-1}^{m} L_{k-1}^n C_{k-1,1}, \dots, L_k^{-m} L_{k-1}^{m} L_{k-1}^n C_{k-1,i_{k-1}},$$
$$L_k^n C_{k,1}, \dots, L_k^n C_{k,i_k - 1} \big).$$

In particular, $L_i^n$ appears twice as many times as before for every $1 \le i < k$,
while $L_k^n$ appears one time less, therefore disappearing after $i_k$ steps. Notice also
that at each step we get twice plus one as many constant sequences as before.
Applying these observations we see that the system $(L_1^n C_1, \dots, L_j^n C_j)$ reduces to
one consisting only of constant sequences after at most $a(j)$ steps, with $a : \mathbb{N} \to \mathbb{N}$
the function recursively defined by $a(1) = 1$ and $a(n + 1) = a(n) + 2^{a(n)}$. We may
then proceed as in Example A.1.
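To get a feeling for how quickly the bound $a(j)$ grows, the recursion $a(1) = 1$, $a(n+1) = a(n) + 2^{a(n)}$ can be evaluated directly. The following Python sketch does this; the function name is ours, and only small arguments are practical since the values grow like a tower of exponentials.

```python
def a(n):
    """a(1) = 1 and a(n + 1) = a(n) + 2 ** a(n)."""
    value = 1
    for _ in range(n - 1):
        value += 2 ** value
    return value


first_values = [a(n) for n in range(1, 5)]
```

The first four values are 1, 3, 11 and 2059, and $a(5) = 2059 + 2^{2059}$ already has 620 decimal digits.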
Example A.3. Consider now a system of the form $(S_1^n C_1, \dots, S_{j-1}^n C_{j-1}, T^{n^2} S_j^n C_j)$
for commuting $T, S_i, C_i$. The $m$-reduction is given by

$$(S_1^n C_1, \dots, S_{j-1}^n C_{j-1}, \; T^{-2mn - m^2} S_j^{-m}, \; T^{-2mn - m^2} S_j^{-m} S_1^{n+m} C_1, \dots, T^{-2mn - m^2} S_j^{-m} S_{j-1}^{n+m} C_{j-1}),$$

which is a linear system and therefore reduces to the trivial system by the procedure
discussed in Example A.2.

Example A.4. If we are given a system of size 1 consisting of a polynomial
$G$-sequence $g$ then it is obvious that the number of steps required to reach the trivial
system is the total degree of $g$. Notice that this is true even when the group $G$ is
not assumed to be nilpotent.

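Example A.5. Consider $(T^{n^2}, T^{n^2} S^n)$ for commuting $T$ and $S$. The $m$-reduction
is given by

$$(T^{n^2}, T^{n^2} S^n) \xrightarrow{m} (T^{n^2}, \; T^{-2mn - m^2} S^{-m}, \; T^{n^2} S^{-m})$$
$$\xrightarrow{l} (T^{n^2}, \; T^{-2mn - m^2} S^{-m}, \; T^{-2ln - l^2}, \; T^{n^2}, \; T^{-2ln - l^2 - 2m(n + l) - m^2} S^{-m}),$$

and this is equivalent to a system of the form studied in Example A.3.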

Example A.6. Consider $(T^n, S^n)$ with $T$ and $S$ generating a nilpotent group. The
$m_1$-reduction is given by

$$(T^n, \; S^{-m_1}, \; S^{-m_1} T^{m_1} T^n).$$

Write $C := S^{-m_1} T^{m_1}$, $C^{(1)} := S^{-m_1}$. Then the $m_2$-reduction of this is given by

$$\sim (C^{(1)}, C^{(2)}, C^{(3)}, T^n, [C^{-1}, T^{m_2}] T^n),$$

for some constant $G$-sequences $C^{(2)}, C^{(3)}$ which depend on $m_1, m_2$. By the same
reasoning we see that after reducing at $m_3, \dots, m_l$ (and passing to equivalent
systems) we get the system

$$(C^{(1)}, C^{(2)}, \dots, C^{(c(l))}, T^n, [[[[C^{-1}, T^{m_2}]^{-1}, T^{m_3}]^{-1}, \dots]^{-1}, T^{m_l}] T^n),$$

for some constant $G$-sequences $C^{(i)}$, $1 \le i \le c(l)$, with $c : \mathbb{N} \to \mathbb{N}$ the increasing
function defined recursively by $c(1) = 1$ and $c(n + 1) = 2c(n) + 1$. Clearly, this
is equivalent to $(C^{(0)}, C^{(1)}, \dots, C^{(c(l))}, T^n)$ for some $l$ of size at most $s + 1$, with $s$
the nilpotency class of the group. Since any reduction of this last system will be a
constant system of size $c(l + 1)$ it follows that our original system $(T^n, S^n)$ reduces
to the trivial one in at most $s + 2 + c(s + 1)$ steps. Of course, $s + 2$ steps would
have sufficed if we were allowed to cheat.
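The recursion for $c$ has the closed form $c(n) = 2^n - 1$, so the bound $s + 2 + c(s + 1)$ equals $s + 1 + 2^{s+1}$. The short Python sketch below checks the closed form against the recursion and evaluates the bound; the function names are ours and purely illustrative.

```python
def c(n):
    """c(1) = 1 and c(n + 1) = 2 * c(n) + 1."""
    value = 1
    for _ in range(n - 1):
        value = 2 * value + 1
    return value


def step_bound(s):
    """The bound s + 2 + c(s + 1) for nilpotency class s, as stated in Example A.6."""
    return s + 2 + c(s + 1)


closed_form_ok = all(c(n) == 2 ** n - 1 for n in range(1, 20))
```

For a group of nilpotency class $s = 2$, for example, this gives a bound of $2 + 2 + 7 = 11$ steps, compared with $s + 2 = 4$ steps when cheating is allowed.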

Example A.7. Our last example is the system $(T^{n^2}, S^{n^2})$ for commuting $T$ and
$S$. We have

$$(T^{n^2}, S^{n^2}) \xrightarrow{m_1} (T^{n^2}, \; S^{-2nm_1 - m_1^2}, \; S^{-2nm_1 - m_1^2} T^{n^2} T^{2nm_1 + m_1^2})$$
$$\sim (S^{-2nm_1 - m_1^2}, \; T^{n^2}, \; S^{-2nm_1 - m_1^2} T^{n^2} T^{2nm_1 + m_1^2})$$
$$\xrightarrow{m_2} \big( S^{-2nm_1 - m_1^2}, \; T^{n^2}, \; T^{-2nm_2 - m_2^2 - 2m_1 m_2} S^{2m_1 m_2},$$
$$T^{-2nm_2 - m_2^2 - 2m_1 m_2} S^{-2nm_1 - m_1^2}, \; T^{n^2} S^{2m_1 m_2} T^{-2m_1 m_2} \big)$$
$$\xrightarrow{m_3} \big( S^{-2nm_1 - m_1^2}, \; T^{n^2}, \; T^{-2nm_2 - m_2^2 - 2m_1 m_2} S^{2m_1 m_2},$$
$$T^{-2nm_2 - m_2^2 - 2m_1 m_2} S^{-2nm_1 - m_1^2}, \; T^{-2nm_3 - m_3^2},$$
$$T^{-2nm_3 - m_3^2} S^{-2nm_1 - 2m_1 m_3 - m_1^2}, \; T^{n^2},$$
$$T^{-2nm_3 - m_3^2 - 2nm_2 - 2m_2 m_3 - m_2^2 - 2m_1 m_2} S^{2m_1 m_2},$$
$$T^{-2nm_3 - m_3^2 - 2nm_2 - 2m_2 m_3 - m_2^2 - 2m_1 m_2} S^{-2nm_1 - 2m_1 m_3 - m_1^2} \big),$$

and this last system is equivalent to one of the form studied in Example A.3.
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